{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intro to Thinc's `Model` class, model definition and methods\n",
    "\n",
    "Thinc follows a functional-programming approach to model definition. Its approach is especially effective for **complicated network architectures**, and use cases where different data types need to be passed through the network to reach specific subcomponents. This notebook shows how to compose Thinc models and how to use the `Model` class and its methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"thinc>=8.0.0a0\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thinc provides a variety of [layers](https://thinc.ai/docs/api-layers), functions that create `Model` instances. Thinc tries to avoid inheritance, preferring function composition. The `Linear` function gives you a model that computes `Y = X @ W.T + b` (the function is defined in `thinc.layers.linear.forward`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from thinc.api import Linear, zero_init\n",
    "\n",
    "n_in = numpy.zeros((128, 16), dtype=\"f\")\n",
    "n_out = numpy.zeros((128, 10), dtype=\"f\")\n",
    "\n",
    "model = Linear(nI=n_in.shape[1], nO=n_out.shape[1], init_W=zero_init)\n",
    "nI = model.get_dim(\"nI\")\n",
    "nO = model.get_dim(\"nO\")\n",
    "print(f\"Initialized model with input dimension nI={nI} and output dimension nO={nO}.\")"
   ]
  },
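  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check of the formula above, the following sketch compares the layer's prediction with `X @ W.T + b` computed directly in NumPy. The names `check_model`, `X_check`, `W_check` and `b_check` are introduced here only for illustration, and the check assumes the default NumPy backend."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from thinc.api import Linear\n",
    "\n",
    "# Illustrative names: a small layer with 4 inputs and 3 outputs.\n",
    "check_model = Linear(nO=3, nI=4)\n",
    "check_model.initialize()\n",
    "X_check = numpy.random.uniform(-1, 1, (5, 4)).astype(\"f\")\n",
    "W_check = check_model.get_param(\"W\")  # shape (nO, nI)\n",
    "b_check = check_model.get_param(\"b\")  # shape (nO,)\n",
    "\n",
    "# Compare the layer's output with the formula computed by hand.\n",
    "Y_check = check_model.predict(X_check)\n",
    "print(numpy.allclose(Y_check, X_check @ W_check.T + b_check, atol=1e-5))"
   ]
  },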
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Models support **dimension inference from data**. You can defer some or all of the dimensions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Linear(init_W=zero_init)\n",
    "print(\"Initialized model with no input/output dimensions.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = numpy.zeros((128, 16), dtype=\"f\")\n",
    "Y = numpy.zeros((128, 10), dtype=\"f\")\n",
    "model.initialize(X=X, Y=Y)\n",
    "nI = model.get_dim(\"nI\")\n",
    "nO = model.get_dim(\"nO\")\n",
    "print(f\"Initialized model with input dimension nI={nI} and output dimension nO={nO}.\")"
   ]
  },
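  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `chain` function wires two or more model instances together in a feed-forward relationship. Dimension inference is especially helpful here: only the hidden layer's size needs to be given, and the remaining dimensions are inferred from the data."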
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinc.api import chain, glorot_uniform_init\n",
    "\n",
    "n_hidden = 128\n",
    "X = numpy.zeros((128, 16), dtype=\"f\")\n",
    "Y = numpy.zeros((128, 10), dtype=\"f\")\n",
    "\n",
    "model = chain(Linear(n_hidden, init_W=glorot_uniform_init), Linear(init_W=zero_init))\n",
    "model.initialize(X=X, Y=Y)\n",
    "nI = model.get_dim(\"nI\")\n",
    "nO = model.get_dim(\"nO\")\n",
    "nO_hidden = model.layers[0].get_dim(\"nO\")\n",
    "print(f\"Initialized model with input dimension nI={nI} and output dimension nO={nO}.\")\n",
    "print(f\"The size of the hidden layer is {nO_hidden}.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We call functions like `chain` [**combinators**](https://thinc.ai/docs/api-layers#combinators). Combinators take one or more models as arguments, and return another model instance, without introducing any new weight parameters. Another useful combinator is `concatenate`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinc.api import concatenate\n",
    "\n",
    "model = concatenate(Linear(n_hidden), Linear(n_hidden))\n",
    "model.initialize(X=X)\n",
    "nO = model.get_dim(\"nO\")\n",
    "print(f\"Initialized model with output dimension nO={nO}.\")"
   ]
  },
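  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output width printed above is the sum of the two children's output widths. As a minimal sketch of what the combined layer computes, the cell below compares a `concatenate` model with the horizontally stacked outputs of its children. The layer sizes and the names `left`, `right`, `concat_model` and `X_small` are made up for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from thinc.api import Linear, concatenate\n",
    "\n",
    "# Two children that share the same input width (4) but differ in output width.\n",
    "left = Linear(3, 4)\n",
    "right = Linear(2, 4)\n",
    "concat_model = concatenate(left, right)\n",
    "\n",
    "X_small = numpy.ones((5, 4), dtype=\"f\")\n",
    "concat_model.initialize(X=X_small)\n",
    "\n",
    "# Run the combined model, then stack the children's outputs by hand.\n",
    "Y_concat = concat_model.predict(X_small)\n",
    "Y_manual = numpy.hstack([left.predict(X_small), right.predict(X_small)])\n",
    "print(Y_concat.shape)\n",
    "print(numpy.allclose(Y_concat, Y_manual))"
   ]
  },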
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `concatenate` function produces a layer that **runs the child layers separately**, and then **concatenates their outputs together**. This is often useful for combining features from different sources. For instance, we use this all the time to build [spaCy](https://spacy.io)'s embedding layers.\n",
    "\n",
    "Some combinators work on a layer and a numeric argument. For instance, the `clone` combinator creates a number of copies of a layer, and chains them together into a deep feed-forward network. Shape inference is especially handy here: we want the first and last layers to have different shapes, so we avoid providing any dimensions to the layer we clone. We then only have to specify the first layer's output size, and the remaining dimensions are inferred from the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinc.api import clone\n",
    "\n",
    "model = clone(Linear(), 5)\n",
    "model.layers[0].set_dim(\"nO\", n_hidden)\n",
    "model.initialize(X=X, Y=Y)\n",
    "nI = model.get_dim(\"nI\")\n",
    "nO = model.get_dim(\"nO\")\n",
    "print(f\"Initialized model with input dimension nI={nI} and output dimension nO={nO}.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can apply `clone` to model instances that have child layers, making it easy to define more complex architectures. For instance, we often want to attach an activation function and dropout to a linear layer, and then repeat that substructure a number of times. Of course, you can make whatever intermediate functions you find helpful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinc.api import Relu, Dropout\n",
    "\n",
    "def Hidden(dropout=0.2):\n",
    "    return chain(Linear(), Relu(), Dropout(dropout))\n",
    "\n",
    "model = clone(Hidden(0.2), 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some combinators are unary functions: they take only one model. These are usually **input and output transformations**. For instance, the `with_array` combinator produces a model that flattens lists of arrays into a single array, and then calls the child layer to get the flattened output. It then reverses the transformation on the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinc.api import with_array\n",
    "\n",
    "model = with_array(Linear(4, 2))\n",
    "Xs = [model.ops.alloc2f(10, 2, dtype=\"f\")]\n",
    "model.initialize(X=Xs)\n",
    "Ys = model.predict(Xs)\n",
    "print(f\"Prediction shape: {Ys[0].shape}.\")"
   ]
  },
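  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The flatten-and-restore behaviour is easier to see with several arrays of different lengths. The following sketch assumes two made-up sequences with 3 and 7 rows. The names `seq_model`, `Xs_ragged` and `Ys_ragged` are illustrative only. Each output array should keep the row count of its input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from thinc.api import Linear, with_array\n",
    "\n",
    "seq_model = with_array(Linear(4, 2))\n",
    "\n",
    "# Two sequences of different lengths, each with 2 features per row.\n",
    "Xs_ragged = [numpy.zeros((3, 2), dtype=\"f\"), numpy.zeros((7, 2), dtype=\"f\")]\n",
    "seq_model.initialize(X=Xs_ragged)\n",
    "\n",
    "# The wrapped Linear layer sees one flat array; the result is split back up.\n",
    "Ys_ragged = seq_model.predict(Xs_ragged)\n",
    "print([Yi.shape for Yi in Ys_ragged])"
   ]
  },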
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The combinator system makes it easy to wire together complex models very concisely. A concise notation is a huge advantage, because it lets you read and review your model with less clutter, making it easy to spot mistakes and to make changes. For the ultimate in concise notation, you can also take advantage of Thinc's **operator overloading**, which lets you use an infix notation. Operator overloading can lead to unexpected results, so you have to enable the overloading explicitly **in a context manager**. This also lets you control how the operators are bound, making it easy to use the feature with your own combinators. For instance, here is a definition for a text classification network:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinc.api import add, chain, concatenate, clone\n",
    "from thinc.api import with_array, reduce_max, reduce_mean, residual\n",
    "from thinc.api import Model, Embed, Maxout, Softmax, Relu, Dropout\n",
    "\n",
    "nH = 5\n",
    "\n",
    "with Model.define_operators({\">>\": chain, \"|\": concatenate, \"+\": add, \"**\": clone}):\n",
    "    model = (\n",
    "        with_array(\n",
    "            (Embed(128, column=0) + Embed(64, column=1))\n",
    "            >> Maxout(nH, normalize=True, dropout=0.2)\n",
    "        )\n",
    "        >> (reduce_max() | reduce_mean())\n",
    "        >> residual(Relu() >> Dropout(0.2)) ** 2\n",
    "        >> Softmax()\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The network above will expect a list of arrays as input, where each array should have two columns with different numeric identifier features. The two features will be embedded using separate embedding tables, and the two vectors added and passed through a `Maxout` layer with layer normalization and dropout. The sequences then pass through two pooling functions, and the concatenated results are passed through two `Relu` layers with dropout and residual connections. Finally, the sequence vectors are passed through an output layer, which has a `Softmax` activation."
   ]
  },
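  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a smaller illustration of the infix notation, the sketch below binds only `>>` to `chain` and builds a two-layer network. The layer sizes and the names `infix_model` and `X_infix` are assumptions made for this example. Inside the `with` block, `Relu(8) >> Linear(2)` is intended to build the same model as `chain(Relu(8), Linear(2))`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from thinc.api import Model, Linear, Relu, chain\n",
    "\n",
    "# Only the operators listed in the mapping are available inside the block.\n",
    "with Model.define_operators({\">>\": chain}):\n",
    "    infix_model = Relu(8) >> Linear(2)\n",
    "\n",
    "# The input width (3) is inferred from the sample data.\n",
    "X_infix = numpy.zeros((4, 3), dtype=\"f\")\n",
    "infix_model.initialize(X=X_infix)\n",
    "print(infix_model.name)\n",
    "print(infix_model.predict(X_infix).shape)"
   ]
  },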
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Using a model\n",
    "\n",
    "Define the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinc.api import Linear, Adam\n",
    "import numpy\n",
    "\n",
    "X = numpy.zeros((128, 10), dtype=\"f\")\n",
    "dY = numpy.zeros((128, 10), dtype=\"f\")\n",
    "\n",
    "model = Linear(10, 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Initialize the model with a sample of the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.initialize(X=X, Y=dY)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the model over some data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y = model.predict(X)\n",
    "Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get a callback to backpropagate:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y, backprop = model.begin_update(X)\n",
    "Y, backprop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the callback to calculate the gradient with respect to the inputs. If the model has trainable parameters, gradients for the parameters are accumulated internally, as a side-effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dX = backprop(dY)\n",
    "dX"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `backprop()` callback only increments the parameter gradients; it doesn't actually change the weights. To update the weights, call `model.finish_update()`, passing it an optimizer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = Adam()\n",
    "model.finish_update(optimizer)"
   ]
  },
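  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Put together, `begin_update`, the backprop callback and `finish_update` form one training step. The following is a minimal sketch of a training loop under stated assumptions: a toy regression target `y = 2x`, a mean-squared-error style gradient, and an arbitrary learning rate and step count. The names `train_model`, `train_X`, `train_Y` and `train_optimizer` are illustrative and not part of Thinc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from thinc.api import Linear, Adam\n",
    "\n",
    "# Toy data (assumption for this sketch): learn y = 2x.\n",
    "train_X = numpy.linspace(-1, 1, 64, dtype=\"f\").reshape(-1, 1)\n",
    "train_Y = 2 * train_X\n",
    "\n",
    "train_model = Linear(1, 1)\n",
    "train_model.initialize(X=train_X, Y=train_Y)\n",
    "train_optimizer = Adam(0.1)\n",
    "\n",
    "for step in range(100):\n",
    "    # Forward pass, keeping the callback for the backward pass.\n",
    "    Yh, backprop_fn = train_model.begin_update(train_X)\n",
    "    # Gradient of the squared error with respect to the output, averaged over rows.\n",
    "    d_Yh = (Yh - train_Y) / train_X.shape[0]\n",
    "    # Accumulate parameter gradients, then apply them with the optimizer.\n",
    "    backprop_fn(d_Yh)\n",
    "    train_model.finish_update(train_optimizer)\n",
    "\n",
    "loss = float(((train_model.predict(train_X) - train_Y) ** 2).mean())\n",
    "print(f\"Mean squared error after training: {loss:.6f}\")"
   ]
  },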
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can get and set dimensions, parameters and attributes by name:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dim = model.get_dim(\"nO\")\n",
    "W = model.get_param(\"W\")\n",
    "model.attrs[\"hello\"] = \"world\"\n",
    "model.attrs.get(\"foo\", \"bar\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also retrieve parameter gradients, and increment them explicitly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dW = model.get_grad(\"W\")\n",
    "model.inc_grad(\"W\", dW * 0.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, you can serialize models using the `model.to_bytes` and `model.to_disk` methods, and load them back with `from_bytes` and `from_disk`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_bytes = model.to_bytes()"
   ]
  }
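  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A round trip makes the serialization contract concrete. In this sketch, a fresh `Linear` layer is restored from the bytes of an initialized one, and the weights are compared. The names `original`, `data` and `restored` are illustrative. The restored model is assumed to need the same architecture as the one that was saved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from thinc.api import Linear\n",
    "\n",
    "original = Linear(3, 4)\n",
    "original.initialize()\n",
    "data = original.to_bytes()\n",
    "\n",
    "# Load the bytes into a new layer of the same type with no dimensions set.\n",
    "restored = Linear().from_bytes(data)\n",
    "print(restored.get_dim(\"nO\"), restored.get_dim(\"nI\"))\n",
    "print(numpy.allclose(original.get_param(\"W\"), restored.get_param(\"W\")))"
   ]
  }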
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
